De Novo Genome Assembly ◾ 93
3.2 EXAMPLES OF DE NOVO ASSEMBLERS
The above briefly defines the three kinds of algorithms used for de novo genome
assembly. For the details of the algorithms themselves, you may need to refer to an algorithms book.
In the following, we discuss some example assemblers to show how de novo genome
assembly is performed. Many papers have compared the performance of these and other
assemblers; readers can refer to those papers to see the differences that researchers have found.
3.2.1 ABySS
ABySS (Assembly By Short Sequences) [8] is a parallel de novo genome assembler developed
to assemble very large data sets of short reads produced by NGS technologies. It performs
assembly in two stages. First, it generates all possible k-mers from the reads, removes
k-mers likely to contain sequencing errors, and builds initial contigs using a de Bruijn graph.
Second, it uses mate-pair information to extend contigs by resolving ambiguities in contig
overlaps, and merges the unambiguously connected graph nodes. ABySS can be run in two modes:
Bloom filter mode, which stores the k-mers in a Bloom filter (a memory-efficient, hash-based
probabilistic data structure), and MPI mode, which uses the Message Passing Interface (MPI)
to parallelize the de novo assembly across processes. The Bloom filter mode is recommended
over the legacy MPI mode because it reduces memory usage by roughly a factor of ten. ABySS
can be installed following the instructions available at “https://github.com/bcgsc/abyss”.
On Ubuntu, we can install it using “sudo apt-get install abyss”. Once ABySS has been
installed, the “abyss-pe” command can be
FIGURE 3.5 De Bruijn graphs.